Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Detailed consensus data correlated together. #5302

Open
wants to merge 2 commits into
base: develop
Choose a base branch
from

Conversation

mtrippled
Copy link
Collaborator

@mtrippled mtrippled commented Feb 20, 2025

Combine multiple related debug log data points into a single
message. Allows quick correlation of events that
previously were either not logged or, if logged, strewn across multiple lines, making correlation difficult. The Heartbeat Timer and consensus ledger accept processing each have this capability.

Also guarantees that log entries will be written if the node is a validator, regardless of log severity level. Otherwise, the level of these messages is at INFO severity.

High Level Overview of Change

Enable logging of multiple data points so that they are emitted to the debug log at the same time. Validators will have these messages always emitted, regardless
of other configuration of debug log severity level.

Context of Change

The context is to facilitate data collection and troubleshooting. Currently, this much
of the data in the PR is not being logged. And for that which is being logged, it's
written all over the debug log on multiple lines. This makes it difficult to correlate
consensus-related events.

And because this change emits the logs automatically for validators, no special configuration of severity level needs to be made.

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Refactor (non-breaking change that only restructures code)
  • Performance (increase or change in throughput and/or latency)
  • Tests (you added tests for code that already exists, or your new feature included in this PR)
  • Documentation update
  • Chore (no impact to binary, e.g. .gitignore, formatting, dropping support for older tooling)
  • Release

API Impact

  • Public API: New feature (new methods and/or new fields)
  • Public API: Breaking change (in general, breaking changes should only impact the next api_version)
  • libxrpl change (any change that may affect libxrpl or dependents of libxrpl)
  • Peer protocol change (must be backward compatible or bump the peer protocol version)

New, verbose log entries are created. They can be grepped with the string "ConsensusLogger". They are emitted approximately every second, with an additional entry for each consensus "accept" phase. It increases logging requirements. Estimated 5MB/hr.

To see the log messages, grep for:
"ConsensusLogger: Heartbeat Timer" and "ConsensusLogger: onAccept"

If configured as a validator, all debug log severity levels will cause these messages to be emitted (though I'm not sure if kDisabled is over-ridden). To test, configure at warning, error, or fatal, and the log messages should still be emitted. They'll still say they're NFO (info) level, but they bypass the filter. If not configured as a validator, verify that the messages only appear for "info" level or more verbose.

Before committing to this, the community should be informed of the new feature since it will require more disk space, at least for validator operators or people with very terse logging configured. However, the disk requirements are not very large: roughly 5MB per hour. This is < 1GB/week, which is quite small for the benefit.

@mtrippled
Copy link
Collaborator Author

Rebased against a more recent commit.

@mtrippled mtrippled reopened this Feb 20, 2025
Copy link

codecov bot commented Feb 20, 2025

Codecov Report

Attention: Patch coverage is 52.41935% with 118 lines in your changes missing coverage. Please review.

Project coverage is 78.1%. Comparing base (844646d) to head (257a2de).
Report is 3 commits behind head on develop.

Files with missing lines Patch % Lines
src/xrpld/consensus/Consensus.h 75.8% 31 Missing ⚠️
src/xrpld/app/consensus/RCLConsensus.cpp 3.4% 28 Missing ⚠️
src/xrpld/app/misc/NetworkOPs.cpp 6.7% 28 Missing ⚠️
src/xrpld/consensus/Consensus.cpp 54.7% 24 Missing ⚠️
include/xrpl/beast/utility/WrappedSink.h 0.0% 2 Missing ⚠️
src/libxrpl/basics/Log.cpp 0.0% 2 Missing ⚠️
src/xrpld/app/consensus/RCLConsensus.h 0.0% 2 Missing ⚠️
src/libxrpl/beast/utility/src/beast_Journal.cpp 0.0% 1 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff            @@
##           develop   #5302     +/-   ##
=========================================
+ Coverage     77.9%   78.1%   +0.2%     
=========================================
  Files          791     790      -1     
  Lines        68006   67910     -96     
  Branches      8346    8227    -119     
=========================================
+ Hits         52967   53028     +61     
+ Misses       15039   14882    -157     
Files with missing lines Coverage Δ
include/xrpl/basics/Log.h 71.4% <ø> (ø)
include/xrpl/beast/utility/Journal.h 96.5% <ø> (ø)
src/xrpld/app/main/Application.cpp 69.3% <100.0%> (ø)
src/xrpld/app/misc/NetworkOPs.h 100.0% <ø> (ø)
src/libxrpl/beast/utility/src/beast_Journal.cpp 58.0% <0.0%> (-1.2%) ⬇️
include/xrpl/beast/utility/WrappedSink.h 52.4% <0.0%> (-5.5%) ⬇️
src/libxrpl/basics/Log.cpp 18.4% <0.0%> (-0.2%) ⬇️
src/xrpld/app/consensus/RCLConsensus.h 84.0% <0.0%> (-7.3%) ⬇️
src/xrpld/consensus/Consensus.cpp 74.0% <54.7%> (-24.4%) ⬇️
src/xrpld/app/consensus/RCLConsensus.cpp 62.2% <3.4%> (-3.2%) ⬇️
... and 2 more

... and 13 files with indirect coverage changes

Impacted file tree graph

message. Allows quick correlation of events that
previously were either not logged or, if logged, strewn
across multiple lines, making correlation difficult.
The Heartbeat Timer and consensus ledger accept processing
each have this capability.

Also guarantees that log entries will be written if the
node is a validator, regardless of log severity level.
Otherwise, the level of these messages is at INFO severity.
@@ -34,38 +34,55 @@ shouldCloseLedger(
std::chrono::milliseconds openTime, // Time waiting to close this ledger
std::chrono::milliseconds idleInterval,
ConsensusParms const& parms,
beast::Journal j)
beast::Journal j,
std::unique_ptr<std::stringstream> const& clog)
Copy link
Collaborator

@vlntb vlntb Feb 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of adding a new parameter to pass clog, should we extend beast::Journal to support streamed logging? Otherwise, we have to pass as a parameter two loggers now. It is our version of beast and we are free to modify it. We might have to switch to shared_ptr since beast::Journal is passed by value, but otherwise, it should be possible.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A single journal object won't work because the ConsensusLogger crosses through multiple objects and free functions, each with possibly distinct journal objects. This does not replace the existing journal objects, but adds additional logging. In some cases, existing logging could be subsumed by this, but in other cases, what's logged should stay being logged, such as warnings or errors.

{
std::string header_;
beast::Journal j_;
std::unique_ptr<std::stringstream> ss_;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need std::unique_ptr if the clog is passed by reference everywhere?
I don't see a need for transfer of ownership.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The unique pointer is mainly so the CLOG macro (in https://github.com/XRPLF/rippled/pull/5302/files#diff-e6faaf2ceab5905a3aeb3755b92c06f0ee967f612208b2b1f10b5dd963a574bf) can be used to short-circuit if there's no object. This avoids string processing of <<. This is the same idea as the JLOG macro (which is right above CLOG).

Some of the functions are called for reasons other than the heartbeat timer, which is another reason the unique pointer could be empty. In this case, it probably makes sense to log other code paths, not just the 2 I included (heartbeat timer and doAccept). There are other consensus activities that aren't included in this currently, but I think they're the most essential as a starting point. I want to be careful not to make this into some kind of "log everything" project, and stick instead with the most essential. I think if more things are identified as useful that they can be included, and remove the non-useful. But mainly it's a starting point for trouble-shooting, not necessarily a guarantee that it identifies all possible problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants